\section{Conclusion}
\label{sec:conclusion}

We have presented a parallel particle simulation implemented in CUDA and evaluated it on a GPU.  Our approach relies on binning to achieve speedup over the naive parallel approach.  As starting points, we used an $O(n)$ serial implementation that demonstrates the binning approach and a naive $O(n^2)$ parallel CUDA implementation.  Our binned CUDA implementation is significantly faster than the naive parallel one: for simulations of more than 10{,}000 particles, it achieves a speedup of more than two orders of magnitude.
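
The growth of this gap with problem size is consistent with a rough cost model.  This is only a sketch: the constants $c_{\mathrm{naive}}$ and $c_{\mathrm{bin}}$ are hypothetical per-interaction costs introduced here for illustration, not measured quantities, and the model ignores parallel overheads such as memory traffic and synchronization.  Assuming each particle examines every other particle in the naive case but only a bounded number of nearby particles under binning, the work per time step scales as
\[
T_{\mathrm{naive}}(n) \approx c_{\mathrm{naive}}\, n^2,
\qquad
T_{\mathrm{bin}}(n) \approx c_{\mathrm{bin}}\, n,
\qquad
\frac{T_{\mathrm{naive}}(n)}{T_{\mathrm{bin}}(n)} \approx \frac{c_{\mathrm{naive}}}{c_{\mathrm{bin}}}\, n ,
\]
so, under these assumptions, the advantage of binning would be expected to grow roughly linearly with $n$.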